2024-08-01 10:31:36.AIbase.10.7k
Unlocking the 'Black Box' of Language Models! Google DeepMind Releases a Visualization Tool Gemma Scope
Google DeepMind's latest research, Gemma Scope, unveils the secrets of the 'black box' of language models by using Sparse Autoencoders (SAEs) to decompose and reconstruct the activations of language models, aiming to reveal meaningful features. Gemma Scope employs JumpReLU SAEs, which optimize reconstruction loss and regularize the number of active latent features by controlling activations to reveal the internal mechanisms of language models. The research found that the performance of residual flow SAEs is generally low, and sequence length affects the performance of SAEs.